Factor-Based Uyghur-Chinese Statistical Machine Translation
Authors
Abstract
This paper is an initial exploration of Uyghur-Chinese statistical machine translation. Uyghur and Chinese differ greatly: the former is an agglutinative language with very productive inflectional and derivational word-formation processes, whereas the characters of the latter are almost hieroglyphic, so morpheme-level processing does not apply to them. We integrate additional Uyghur information, such as affixes and stems, into a statistical machine translation system. This is the so-called factored model, an extension of the phrase-based approach. The experiments show that morphological strategies can effectively improve the performance of the translation system.
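As a rough illustration of what "integrating stems and affixes as factors" can look like in practice, the sketch below rewrites each source token as a pipe-separated bundle of surface form, stem, and affix string, which is the input convention used by factored phrase-based toolkits such as Moses. The analysis table, the Latin-transliterated example words, the `NULL` placeholder, and the function name `to_factored` are assumptions made for this example and are not taken from the paper. A real system would obtain the analyses from a Uyghur morphological analyzer.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-written analyses for illustration only:
# surface form -> (stem, list of suffixes).
TOY_ANALYSES: Dict[str, Tuple[str, List[str]]] = {
    "mektepke": ("mektep", ["ke"]),
    "oqughuchilar": ("oqughuchi", ["lar"]),
}


def to_factored(
    tokens: List[str],
    analyses: Dict[str, Tuple[str, List[str]]] = TOY_ANALYSES,
    sep: str = "|",
) -> str:
    """Render tokens as 'surface|stem|affixes' factor bundles.

    Tokens without an analysis fall back to using the surface form
    as the stem and a NULL placeholder as the affix factor.
    """
    factored = []
    for tok in tokens:
        stem, suffixes = analyses.get(tok, (tok, []))
        affix = "+".join(suffixes) if suffixes else "NULL"
        factored.append(sep.join([tok, stem, affix]))
    return " ".join(factored)


print(to_factored(["oqughuchilar", "mektepke", "bardi"]))
# prints: oqughuchilar|oqughuchi|lar mektepke|mektep|ke bardi|bardi|NULL
```

Keeping the surface form as the first factor means a translation model can still fall back to whole-word phrases, while the stem and affix factors give it sparser-data-friendly evidence for inflected forms it has rarely seen.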
Related Resources
A Phrase Table Filtering Model Based on Binary Classification for Uyghur-Chinese Machine Translation
In statistical machine translation, a large number of unreasonable phrase pairs in a phrase table can affect decoding efficiency and overall translation performance, especially in Uyghur-Chinese machine translation. In this paper, we present a novel phrase table filtering model based on binary classification, which considers differences between Uyghur and Chinese, and draw lessons from bin...
Research for Uyghur-Chinese Neural Machine Translation
The problem of rare and unknown words is an important issue in Uyghur-Chinese machine translation, especially using neural machine translation model. We propose a novel way to deal with the rare and unknown words. Based on neural machine translation of using pointers over input sequence, our approach which consists of preprocess and post-process can be used in all neural machine translation mod...
Uyghur-Chinese Translation Disambiguation Method Research Based on Knowledge Automatic-Acquisition
This thesis studies the disambiguation method in Uyghur-Chinese translation, and proposes the design philosophy of automatic-acquisition in translation label library aiming at the deficiency of disambiguation corpus in Uyghur. It refers to the existing Uyghur-Chinese bilingual dictionary, Chinese corpus and the Internet, and acquires the corresponding Chinese translation label examples to Uyghu...
Chinese-Uyghur Sentence Alignment: An Approach Based on Anchor Sentences
This paper, which builds on previous studies on sentence alignment, introduces a sentence alignment method in which some sentences are used as “anchors” and a two step procedure is applied. In the first step, some lexical information such as proper names, technical terms, numbers and punctuation marks, location information and length information are used to generate anchor sentences that satisf...
Rule Based Analysis of the Uyghur Nouns
This paper describes the implementation of a rule-based analyzer for Uyghur (spoken in Xinjiang, China) nouns. We hope this paper will contribute to advanced studies of the Uyghur language in machine translation and natural language processing. Like all Turkic languages, Uyghur is an agglutinative language that has productive inflectional and derivational suffixes. In...